This document accompanies the replication data and code for "Do Primary Elections Exacerbate Congressional Polarization?" by Anthony Fowler and Shu Fu. Please contact Anthony Fowler (anthony.fowler@uchicago.edu) with any questions.

All analyses were conducted using Stata 19.5.

Stata 18 or earlier will not work because, in those versions, the areg command cannot accommodate two sets of fixed effects. To replicate the results in an earlier version of Stata, you can use the reghdfe command instead, but the estimated standard errors will differ slightly, and without a powerful computer with sufficient working memory you will not be able to estimate standard errors at all.

The file "code.do" contains all code necessary to replicate all tables and figures in the paper and online appendix.

Running all of the analyses may require a relatively powerful computer or an external server. The replication code typically deletes unnecessary variables before running regressions. This is not necessary for reproducing the results, but it may reduce computational time or help to avoid situations where a user's computer does not have sufficient memory.

One of the authors ran the code in its entirety on a desktop computer (with 128 GB of RAM) using the MP 8-core version of Stata, and it took approximately 72 minutes. Here is a quick breakdown of the time it took to complete each step:
Compiling House data (including the estimation of CVP scores and classification of bills): 30 minutes
Compiling Senate data (including the estimation of CVP scores and classification of bills): 2 minutes
Table 1: 1 minute
Figure 1: <1 minute
Table 2: 1 minute
Figure 2: <1 minute
Table 3: 6 minutes
Figure A1: 1 minute
Table A1: 1 minute
Table A2: 1 minute
Table A3: 1 minute
Figure A2: 1 minute
Table A4: 2 minutes
Table A5: 17 minutes
Figures A3-A5 and Table A6: 2 minutes
Figure A6: 6 minutes

Most of the analyses utilize the data sets "CleanedData_House.dta" and "CleanedData_Senate.dta". The following data sets are necessary to reconstruct these data sets:

HSall_votes.csv: vote-level data from Voteview.com
HSall_rollcalls.csv: bill-level data from Voteview.com
HSall_members.csv: member-level data from Voteview.com
Hall_rollcalls_issues.csv: issue classifications of bills from Voteview.com
dime_recipients_1979_2020.csv: data on campaign contributions from the Database on Ideology, Money in Politics, and Elections
prioritybills.dta: Party priority legislation as classified by Curry and Lee (2020)
primarydates.dta: Primary election dates, collected from the FEC
filingdates.dta: Candidate filing deadlines, collected from the FEC
PrimaryChallenges_1996_2022.dta: data on retirements and contested primaries, collected from official election returns, Wikipedia, and Ballotpedia
contested_senate.dta: information on contested primaries and retirements in the Senate, also collected from official election returns, Wikipedia, and Ballotpedia
GenElecDates.dta: general election dates
Pres_CD_formerge.dta: presidential two-party vote share by congressional district and redistricting cycle, 1952-2020
Pres_State_1916_2020.dta: presidential vote totals by state, 1916-2020
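
Before starting the roughly 30-minute House compilation step, it may help to confirm that all of the input files listed above are present. The following is a minimal sketch in Python (the replication itself runs in Stata). It assumes the files sit together in one directory, which this document does not state; the function name missing_inputs is hypothetical and is not part of "code.do".

```python
from pathlib import Path

# Input files named in this README. File names are copied as written above.
REQUIRED_INPUTS = [
    "HSall_votes.csv",
    "HSall_rollcalls.csv",
    "HSall_members.csv",
    "Hall_rollcalls_issues.csv",
    "dime_recipients_1979_2020.csv",
    "prioritybills.dta",
    "primarydates.dta",
    "filingdates.dta",
    "PrimaryChallenges_1996_2022.dta",
    "contested_senate.dta",
    "GenElecDates.dta",
    "Pres_CD_formerge.dta",
    "Pres_State_1916_2020.dta",
]


def missing_inputs(data_dir="."):
    """Return the required input files that are not found in data_dir.

    Assumption: all inputs live in a single directory. Adjust data_dir
    if the replication archive is organized differently.
    """
    root = Path(data_dir)
    return [name for name in REQUIRED_INPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs(".")
    if missing:
        print("Missing input files:")
        for name in missing:
            print("  " + name)
    else:
        print("All input files found.")
```

An empty list from missing_inputs means every file named above was found; it does not check file contents.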

Codebook for "CleanedData_House.dta" and "CleanedData_Senate.dta":
Each row in these data sets corresponds to a member-vote. Each data set contains the following variables:

congress: Congress number 
bill: bill number 
icpsr: unique id for each member
yea: indicator for whether the member voted yes on that particular bill/vote
abstain: indicator for whether the member abstained on that vote
eyear: the upcoming election year
state: postal code for member's state
name: name of member
dem: 1 for Democrats, 0 for Republicans
contested: indicator for whether the member was contested in their primary election
retiring: indicator for whether the member did not seek reelection at the end of the current term
lostprimary: indicator for whether the member ran in and lost their primary election
prop_yea: share of voting members who voted yea on that particular bill
priority: indicator for party-priority legislation
nonprocedural: indicator for non-procedural bills
rcday: day of the vote (all dates are measured as the number of days after January 1, 1960)
pday: day of the member's primary election
fday: day of the challenger filing deadline
afterprimary: indicator for whether the vote is taking place after the primary election date
afterfiling: indicator for whether the vote is taking place after the filing deadline
withparty: indicator for whether the member voted with the majority of their party on this particular bill
viablechallenger: indicator for whether the member faced a primary challenger who raised more than $50,000 before the primary election date
extremechallenger: indicator for whether the member's highest performing primary challenger is more extreme than them according to Bonica's estimates from campaign contributions
born: member's year of birth
died: member's year of death
experience: number of Congresses that the member has served (counting the current Congress)
purple: indicator for whether a member's constituency was above the 25th percentile and below the 75th percentile in two-party presidential vote share in the most recent presidential election
conservative: indicator for whether a member's vote on this particular bill is classified as in the ideologically conservative direction
cvp: CVP score for each member-Congress, computed using only votes taken before the first filing deadline in any state
moderate: indicator for whether a member's CVP score is more moderate (conservative for Democrats and liberal for Republicans) than the median member of their party in that Congress
extreme: indicator for whether the member's vote on this particular bill is classified as ideologically extreme (i.e., conservative for Republicans and liberal for Democrats)
billid: unique identifier for each bill-Congress-chamber
bill_party: unique identifier for each bill-Congress-chamber-party
member_cong: unique identifier for each member-Congress
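
For readers who open the cleaned data outside Stata, two of the codebook definitions above can be illustrated with a short Python sketch. The date variables (rcday, pday, fday) count days after January 1, 1960, so day 0 is that date, and extreme is defined as a conservative vote for Republicans or a liberal vote for Democrats. The function names are hypothetical, and the sketch assumes dem and conservative are coded 0/1 with no missing values; "code.do" remains the authoritative construction.

```python
from datetime import date, timedelta

# Day 0 under the codebook's convention (days after January 1, 1960).
EPOCH = date(1960, 1, 1)


def day_to_date(day_number):
    """Convert a day count such as rcday, pday, or fday to a calendar date."""
    return EPOCH + timedelta(days=int(day_number))


def date_to_day(d):
    """Convert a calendar date to the number of days after January 1, 1960."""
    return (d - EPOCH).days


def is_extreme(dem, conservative):
    """Return 1 if the vote is ideologically extreme for the member's party.

    Follows the codebook definition: conservative votes by Republicans
    (dem == 0) and liberal votes by Democrats (dem == 1) are extreme.
    Assumes both inputs are 0 or 1; missing values are not handled here.
    """
    return int(conservative != dem)
```

For example, day_to_date(pday) - day_to_date(rcday) gives the time between a vote and the member's primary as a timedelta.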

The following variables are in only the House data set:
dist: member's district number
agriculture: indicator that the bill pertains to agriculture 
civilliberties: indicator that the bill pertains to civil liberties
defense: indicator that the bill pertains to defense
government: indicator that the bill pertains to government management
welfare: indicator that the bill pertains to welfare

The following variable is in only the Senate data set:
reelection: indicator for whether a member's seat is up for reelection at the end of the current Congress

The "pres" variable is in both data sets but is coded differently:
pres in House: two-party Republican vote share in the member's district in the most recent presidential election minus the national two-party Republican vote share
pres in Senate: two-party Democratic vote share in the member's state in the most recent presidential election
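
Because the two codings differ in both party and centering, the Senate variable cannot be compared directly with the House variable. The sketch below shows one way to put a Senate value on the House scale. It is an illustration only, not a step taken in "code.do". It assumes vote shares are stored as proportions between 0 and 1 (this document does not state the scale), and it requires the user to supply the national two-party Republican share, which is not a variable in the Senate data set. The function name is hypothetical.

```python
def senate_pres_to_house_scale(dem_share_state, national_rep_share):
    """Re-express the Senate 'pres' coding on the House 'pres' scale.

    dem_share_state: two-party Democratic share in the member's state
        (Senate coding), assumed to be a proportion in [0, 1].
    national_rep_share: national two-party Republican share in the same
        election, supplied by the user, also assumed to be in [0, 1].
    """
    # With two-party shares, the Republican share is the complement.
    rep_share_state = 1.0 - dem_share_state
    # Center on the national Republican share, as in the House coding.
    return rep_share_state - national_rep_share
```

Under these assumptions, a positive result indicates a state that voted more Republican than the nation as a whole, matching the direction of the House coding.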

 





